Sentence Compression For Automated Subtitling: A Hybrid Approach

Authors

  • Vincent Vandeghinste
  • Yi Pan
Abstract

This paper describes a sentence compression tool. An input sentence is analysed using, among other components, a tagger, a shallow parser and a subordinate clause detector. Based on this analysis, several compressed versions of the sentence are generated, each with an associated estimated probability. These probabilities were estimated from a parallel transcript/subtitle corpus. To avoid ungrammatical sentences, the tool also applies a number of rules. The evaluation was carried out at three different pronunciation speeds, with average sentence reduction rates ranging from 40% down to 17%. The proportion of reasonable reductions ranges between 32.9% and 51%, depending on the average estimated pronunciation speed.
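The generate-and-rank idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Chunk` type, the independence assumption between chunks, the character budget standing in for a pronunciation-speed constraint, and all probability values are hypothetical and chosen only for illustration. A removal probability of 0.0 plays the role of a rule that forbids deleting a chunk.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Chunk:
    """A sentence fragment with a hypothetical removal probability.

    removal_prob == 0.0 stands in for a grammaticality rule that
    forbids dropping the chunk (an assumption of this sketch).
    """
    text: str
    removal_prob: float


def candidates(chunks):
    """Yield (compressed_sentence, probability) for every deletion subset.

    Chunks are assumed independent, so a candidate's probability is the
    product of removal_prob for dropped chunks and (1 - removal_prob)
    for kept chunks.
    """
    removable = [i for i, c in enumerate(chunks) if c.removal_prob > 0.0]
    for r in range(len(removable) + 1):
        for drop in combinations(removable, r):
            dropped = set(drop)
            prob = 1.0
            for i, c in enumerate(chunks):
                prob *= c.removal_prob if i in dropped else 1.0 - c.removal_prob
            kept = " ".join(c.text for i, c in enumerate(chunks) if i not in dropped)
            yield kept, prob


def compress(chunks, max_chars):
    """Return the most probable candidate that fits max_chars, or None.

    max_chars is a stand-in for the length limit that a given
    pronunciation speed would impose on a subtitle.
    """
    fitting = [(s, p) for s, p in candidates(chunks) if len(s) <= max_chars]
    return max(fitting, key=lambda sp: sp[1]) if fitting else None


# Toy input; the probabilities are invented, not taken from any corpus.
example = [
    Chunk("The tool", 0.0),
    Chunk("really", 0.9),
    Chunk("compresses sentences", 0.0),
    Chunk("in this paper", 0.6),
]
best = compress(example, max_chars=45)
```

With this toy input, `best` holds the candidate that drops both optional chunks. A tighter budget that no candidate satisfies yields `None`, which corresponds to a sentence that cannot be reduced enough under the rules.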


Similar resources

Sentence Compression For Automatic Subtitling

This paper investigates sentence compression for automatic subtitle generation using supervised machine learning. We present a method for sentence compression as well as discuss generation of training data from compressed Finnish sentences, and different approaches to the problem. The method we present outperforms state-of-the-art baseline in both automatic and human evaluation. On real data, 4...


On the Limits of Sentence Compression by Deletion

Data-driven approaches to sentence compression define the task as dropping any subset of words from the input sentence while retaining important information and grammaticality. We show that only 16% of the observed compressed sentences in the domain of subtitling can be accounted for in this way. We argue that this is partly due to the lack of appropriate evaluation material and estimate that a...


Is Sentence Compression an NLG task?

Data-driven approaches to sentence compression define the task as dropping any subset of words from the input sentence while retaining important information and grammaticality. We show that only 16% of the observed compressed sentences in the domain of subtitling can be accounted for in this way. We argue that part of this is due to evaluation issues and estimate that a deletion model is in fac...


A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...


Automated production of true-cased punctuated subtitles for weather and news broadcasts

Providing subtitling for multimedia content is a highly costly process. Any system aimed at automating at least part of this process may therefore yield significant economic benefits for content providers. In this paper, we present an integrated automatic system capable of automatically subtitling weather forecasts and news broadcasts. In this system, a number of different modules are stringed ...




Publication date: 2004